IMPROVING YOUR TEST QUESTIONS



Choosing the appropriate type of test item to measure students'
understanding of course material and their achievement of course goals
can often be as difficult a task as writing the items themselves. The
purpose of this booklet is (a) to inform you of the uses, advantages and 
limitations of the various item types and (b) to help you develop specific 
skills in writing each kind of item. 

The booklet is divided into the following sections: 

I. Choosing Between Objective and Subjective Test Items

II. Suggestions for Using and Writing Test Items

    Multiple-Choice

    Student Evaluation of Test Item Quality

IV. Assistance Offered by the Office of Instructional
    and Management Services (IMS)

V. References for Further Reading
I. CHOOSING BETWEEN OBJECTIVE AND SUBJECTIVE TEST ITEMS

There are two general categories of test items: (a) objective items, which
require students to select the correct response from several alternatives
or to supply a word or short phrase to answer a question or complete a
statement; and (b) subjective or essay items, which permit the student to
organize and present an original answer. Objective items include
multiple-choice, true-false, matching and completion, while subjective items
include short-answer essay, extended-response essay, problem solving and
performance test items. For some instructional purposes one or the other item
type may prove more efficient and appropriate. To begin our discussion of the
relative merits of each type of test item, test your knowledge of these two
item types by answering the following questions.



Test Item Quiz

(circle the correct answer)

1. Essay exams are easier to construct than
   are objective exams.                                  T  F  ?

2. Essay exams require more thorough student
   preparation and study time than objective
   exams.                                                T  F  ?

3. Essay exams require writing skills where
   objective exams do not.                               T  F  ?

4. Essay exams teach a person how to write.              T  F  ?

5. Essay exams are more subjective in nature
   than are objective exams.                             T  F  ?

6. Objective exams encourage guessing more
   so than essay exams.                                  T  F  ?

7. Essay exams limit the extent of content
   covered.                                              T  F  ?

8. Essay and objective exams can be used to
   measure the same content or ability.                  T  F  ?

9. Essay and objective exams are both good ways
   to evaluate a student's level of knowledge.           T  F  ?

Quiz Answers

1. TRUE - Essay items are generally easier and less time consuming to
construct than are most objective test items. Technically
correct and content appropriate multiple-choice and true-false
test items require an extensive amount of time to write and
revise. For example, a professional item writer produces only
9-10 good multiple-choice items in a day's time.



2. ? - According to research findings it is still undetermined whether
or not essay tests require or facilitate more thorough (or even
different) student study preparation.

3. TRUE - Writing skills do affect a student's ability to communicate
the correct "factual" information through an essay response.
Consequently, students with good writing skills have an
advantage over students who have difficulty expressing
themselves through writing.



4. FALSE - Essays do not teach a student how to write but they can emphasize 
the importance of being able to communicate through writing. 
Constant use of essay tests may encourage the knowledgeable but 
poor writing student to improve his/her writing ability in order 
to improve performance. 



5. TRUE - Essays are more subjective in nature due to their susceptibility
to scoring influences. Different readers can rate identical
responses differently; the same reader can rate the same paper
differently over time; the handwriting, neatness or punctuation
can unintentionally affect a paper's grade; and the lack of
anonymity can affect the grading process. While impossible to
eliminate, scoring influences or biases can be minimized
through procedures discussed later in this booklet.



6. ? - Both item types encourage some form of guessing. Multiple-choice, 

true-false and matching items can be correctly answered through 
blind guessing, yet essay items can be responded to satisfactorily 
through well written bluffing. 

7. TRUE - Due to the extent of time required by the student to respond to
an essay question, only a few essay questions can be included
on a classroom exam. Consequently, a larger number of objective
items can be tested in the same amount of time, thus enabling the
test to cover more content.

8. TRUE - Both item types can measure similar content or learning
objectives. Research has shown that students respond almost
identically to essay and objective test items covering the
same content. Studies by Sax & Collet (1968) and Paterson
(1926) conducted forty-two years apart reached the same
conclusion:

"...there seems to be no escape from the conclusions that
the two types of exams are measuring identical things."
(Paterson, p. 246)

This conclusion should not be surprising; after all, a well
written essay item requires that the student (1) have a
store of knowledge, (2) be able to relate facts and principles,
and (3) be able to organize such information into a coherent
and logical written expression, whereas an objective test item
requires that the student (1) have a store of knowledge, (2) be
able to relate facts and principles, and (3) be able to organize
such information into a coherent and logical choice among
several alternatives.

9. TRUE - Both objective and essay test items are good devices for
measuring student achievement. However, as seen in the previous
quiz answers, there are particular measurement situations where
one item type is more appropriate than the other. Following is
a set of recommendations for using either objective or essay
test items (adapted from Robert L. Ebel, Essentials of
Educational Measurement, 1972, p. 144).

WHEN TO USE ESSAY OR OBJECTIVE TESTS

Essay tests are especially appropriate when:

— the group to be tested is small and the test is not to be reused.

— you wish to encourage and reward the development of student skill in writing.

— you are more interested in exploring the student's attitudes than in
measuring his/her achievement.

— you are more confident of your ability as a critical and fair reader than
as an imaginative writer of good objective test items.



Gilbert Sax and LeVerne S. Collet, "An Empirical Comparison of the Effects
of Recall and Multiple-Choice Tests on Student Achievement," Journal of
Educational Measurement, vol. 5 (1968), 169-73.

Donald Paterson, "Do New and Old Type Examinations Measure Different
Mental Functions?" School and Society, vol. 24 (August 21, 1926), 246-48.

Objective tests are especially appropriate when:

— the group to be tested is large and the test may be reused. 

— highly reliable test scores must be obtained as efficiently as possible. 

— impartiality of evaluation, absolute fairness, and freedom from possible 
test scoring influences (e.g., fatigue, lack of anonymity) are essential. 

— you are more confident of your ability to express objective test items 
clearly than of your ability to judge essay test answers correctly. 

— there is more pressure for speedy reporting of scores than for speedy 
test preparation. 

Either essay or objective tests can be used to: 

— measure almost any important educational achievement a written test can 
measure. 

— test understanding and ability to apply principles. 

— test ability to think critically. 

— test ability to solve problems. 

— test ability to select relevant facts and principles and to integrate them 
toward the solution of complex problems. 

In addition to the preceding suggestions, it is important to realize
that certain item types are better suited than others for measuring particular
learning objectives. For example, learning objectives requiring the student
to demonstrate or to show may be better measured by performance test items,
whereas objectives requiring the student to explain or to describe may be
better measured by essay test items. The matching of learning objective
expectations with certain item types can help you select an appropriate kind
of test item for your classroom exam as well as provide a higher degree of
test validity (i.e., testing what is supposed to be tested). To further
illustrate, several sample learning objectives and appropriate test items
are provided below.
Learning Objective                        Most Suitable Test Item

The student will be able to               Objective Test Item
categorize and name the parts             (M-C, T-F, Matching)
of the human skeletal system.

The student will be able to               Essay Test Item
critique and appraise another             (Extended-Response)
student's English composition
on the basis of its organization.

The student will demonstrate              Performance Test Item
safe laboratory skills.

The student will be able to               Essay Test Item
cite four examples of satire              (Short-Answer)
that Twain uses in Huckleberry Finn.



After you have decided to use either an objective, essay or both objective 
and essay exam, the next step is to select the kind(s) of objective or essay 
item that you wish to include on the exam. To help you make such a choice, the 
different kinds of objective and essay items are presented in the following 
section of this booklet. The various kinds of items are briefly described and 
compared to one another in terms of their advantages and limitations for use. 
Also presented is a set of general suggestions for the construction of each 
item variation. 
II. MULTIPLE-CHOICE TEST ITEMS

The multiple-choice item consists of two parts: (a) the stem, which
identifies the question or problem and (b) the response alternatives.
Students are asked to select the one alternative that best completes
the statement or answers the question. For example,

Sample Multiple-Choice Item

(a) Item Stem: Which of the following is a chemical change?

(b) Response Alternatives:  a. Evaporation of alcohol
                            b. Freezing of water
                           *c. Burning of oil
                            d. Melting of wax

*correct response

Advantages in Using Multiple-Choice Items 

Multiple-choice items can provide ... 

... versatility in measuring all levels of cognitive ability. 

... highly reliable test scores. 

... scoring efficiency and accuracy. 

... objective measurement of student achievement or ability. 

... a wide sampling of content or objectives. 

... a reduced guessing factor when compared to true-false items. 

... different response alternatives which can provide diagnostic feedback. 



Limitations in Using Multiple-Choice Items 

Multiple-choice items ...

... are difficult and time consuming to construct. 

... lead an instructor to favor simple recall of facts. 

... place a high degree of dependence on the student's reading ability 
and instructor's writing ability. 

SUGGESTIONS FOR WRITING MULTIPLE-CHOICE TEST ITEMS

The Stem

1. When possible, state the stem as a direct question rather than as an
incomplete statement.

Undesirable: Alloys are ordinarily produced by ...
Desirable: How are alloys ordinarily produced?

2. Present a definite, explicit and singular question or problem in the stem.

Undesirable: Psychology ...

Desirable: The science of mind and behavior is called ...

3. Eliminate excessive verbiage or irrelevant information from the stem.

Undesirable: While ironing her formal, Jane burned her hand accidentally
on the hot iron. This was due to a transfer of heat by ...

Desirable: Which of the following ways of heat transfer explains why
Jane's hand was burned after she touched a hot iron?

4. Include in the stem any word(s) that might otherwise be repeated in each 
alternative. 

Undesirable: In national elections in the United States the President
is officially

 a. chosen by the people.
 b. chosen by members of Congress.
 c. chosen by the House of Representatives.
*d. chosen by the Electoral College.

Desirable: In national elections in the United States the President
is officially chosen by

 a. the people.
 b. members of Congress.
 c. the House of Representatives.
*d. the Electoral College.

5. Use negatively stated stems sparingly. When used, underline and/or 
capitalize the negative word. 

Undesirable: Which of the following is not cited as an accomplishment 
of the Kennedy administration? 

Desirable: Which of the following is NOT cited as an accomplishment 
of the Kennedy administration? 

Item Alternatives

6. Make all alternatives plausible and attractive to the less knowledgeable
or skillful student.

What process is most nearly the opposite of photosynthesis?

Undesirable                Desirable

 a. Digestion               a. Digestion
 b. Relaxation              b. Assimilation
*c. Respiration            *c. Respiration
 d. Exertion                d. Catabolism



7. Make the alternatives grammatically parallel with each other, and
consistent with the stem.

Undesirable: What would do most to advance the application of atomic
discoveries to medicine?

*a. Standardized techniques for treatment of patients.

 b. Train the average doctor to apply radioactive treatments.

 c. Remove the restriction on the use of radioactive substances.

 d. Establishing hospitals staffed by highly trained radioactive
    therapy specialists.

Desirable: What would do most to advance the application of atomic
discoveries to medicine?

*a. Development of standardized techniques for treatment
    of patients.

 b. Training of the average doctor in application of
    radioactive treatments.

 c. Removal of restriction on the use of radioactive
    substances.

 d. Addition of trained radioactive therapy specialists
    to hospital staffs.



8. Make the alternatives mutually exclusive. 

Undesirable: The daily minimum required amount of milk that a 10 year
old child should drink is

 a. 1-2 glasses.
*b. 2-3 glasses.
*c. 3-4 glasses.
 d. at least 4 glasses.

Desirable: What is the daily minimum required amount of milk a 10 year
old child should drink?

 a. 1 glass.
 b. 2 glasses.
*c. 3 glasses.
 d. 4 glasses.
9. When possible, present alternatives in some logical order
(e.g., chronological, most to least, alphabetical).

At 7 a.m. two trucks leave a diner and travel north. One truck
averages 42 miles per hour and the other truck averages 38 miles
per hour. At what time will they be 24 miles apart?

Undesirable                Desirable

 a. 6 p.m.                  a. 1 a.m.
 b. 9 p.m.                  b. 6 a.m.
 c. 1 a.m.                  c. 9 a.m.
*d. 1 p.m.                 *d. 1 p.m.
 e. 6 a.m.                  e. 6 p.m.
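
The keyed answer to the truck item can be checked with a short calculation before the item goes on an exam. The following is an illustrative sketch in Python, not part of the original item; the function name `hours_until_apart` is hypothetical, and it assumes both trucks hold their average speeds and travel in the same direction.

```python
from datetime import datetime, timedelta


def hours_until_apart(speed_fast_mph, speed_slow_mph, gap_miles):
    """Hours until two vehicles heading the same way are gap_miles apart."""
    # The gap grows at the difference between the two speeds.
    return gap_miles / (speed_fast_mph - speed_slow_mph)


hours = hours_until_apart(42, 38, 24)
departure = datetime(2000, 1, 1, 7, 0)  # the date is arbitrary; only 7 a.m. matters
apart_at = departure + timedelta(hours=hours)
print(f"{hours:g} hours after departure, at {apart_at:%H:%M}")
```

The gap grows by 4 miles each hour, so 24 miles takes 6 hours, which agrees with the keyed response of 1 p.m.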



10. Be sure there is only one correct or best response to the item. 

Undesirable: The two most desired characteristics in a classroom test 
are validity and 

a. precision. 
*b. reliability. 

c. objectivity. 
*d. consistency. 

Desirable: The two most desired characteristics in a classroom test 
are validity and 

a. precision. 

*b. reliability. 

c. objectivity.

d. standardization.



11. Make alternatives approximately equal in length.

Undesirable: The most general cause of low individual incomes in the 
United States is 

*a. lack of valuable productive services to sell. 

b. unwillingness to work. 

c. automation. 

d. inflation. 

Desirable: What is the most general cause of low individual incomes 
in the United States? 

*a. A lack of valuable productive services to sell. 

b. The population's overall unwillingness to work. 

c. The nation's increased reliance on automation. 

d. An increasing national level of inflation. 
12. Avoid irrelevant clues such as grammatical structure, well known verbal
associations or connections between stem and answer.

Undesirable (grammatical clue):
A chain of islands is called an:

*a. archipelago.
 b. peninsula.
 c. continent.
 d. isthmus.

Undesirable (verbal association clue):
The reliability of a test can be estimated by a coefficient of:

 a. measurement.
*b. correlation.
 c. testing.
 d. error.

Undesirable (connection between stem and answer clue):
The height to which a water dam is built depends on

 a. the length of the reservoir behind the dam.
 b. the volume of water behind the dam.
*c. the height of water behind the dam.
 d. the strength of the reinforcing wall.



13. Use at least four alternatives for each item to lower the probability of 
getting the item correct by guessing. 



14. Randomly distribute the correct response among the alternative positions 
throughout the test having approximately the same proportion of 
alternatives a, b, c, d and e as the correct response. 



15. Use the alternatives "none of the above" and "all of the above" sparingly. 
When used, such alternatives should occasionally be used as the correct 
response. 
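
Suggestions 13 and 14 can be illustrated with a short Python sketch. It is offered only as one possible approach under stated assumptions: the helper names `guess_probability` and `balanced_answer_key` are hypothetical, and the key is built by giving each letter a nearly equal share of items and then shuffling the positions.

```python
import random
from collections import Counter


def guess_probability(n_alternatives):
    """Chance of answering one item correctly by blind guessing."""
    return 1 / n_alternatives


def balanced_answer_key(n_items, letters="abcde", seed=None):
    """Return a shuffled key in which each letter is keyed about equally often."""
    rng = random.Random(seed)
    # Repeat the letters enough times to cover every item, then trim.
    repeats = -(-n_items // len(letters))  # ceiling division
    key = (list(letters) * repeats)[:n_items]
    rng.shuffle(key)
    return key


key = balanced_answer_key(40, seed=1)
print(Counter(key))              # each of a-e is the keyed response 8 times
print(guess_probability(2))      # true-false: 0.5
print(guess_probability(4))      # four alternatives: 0.25
```

The items would then be written so that the correct response falls in the position the key assigns. When an item's alternatives must follow a logical order (suggestion 9), that order takes precedence and the key can be adjusted on other items.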
TRUE-FALSE TEST ITEMS

A true-false item can be written in one of three forms: simple, complex,
or compound. Answers can consist of only two choices (simple), more than
two choices (complex), or two choices plus a conditional completion
response (compound). An example of each type of true-false item follows:

Sample True-False Item: Simple
The acquisition of morality is a developmental process.  True  False

Sample True-False Item: Complex
The acquisition of morality is a developmental process.  True  False  Opinion

Sample True-False Item: Compound
The acquisition of morality is a developmental process.  True  False
If this statement is false, what makes it false?



Advantages in Using True-False Items

True-false items can provide . . . 

... the widest sampling of content or objectives per unit of testing time. 
... scoring efficiency and accuracy. 

... versatility in measuring all levels of cognitive ability.
... highly reliable test scores. 

... an objective measurement of student achievement or ability. 



Limitations in Using True-False Items 



True-false items 



... incorporate an extremely high guessing factor. For simple true-false 
items, each student has a 50/50 chance of correctly answering the 
item without any knowledge of the item's content. 

... can often lead an instructor to write ambiguous statements due to 
the difficulty of writing statements which are unequivocally true 
or false. 

... do not discriminate between students of varying ability as well as 
other item types. 

... can often include more irrelevant clues than do other item types.

... can often lead an instructor to favor testing of trivial knowledge.
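
To make the guessing limitation concrete, the Python sketch below estimates how often blind guessing alone reaches a given score. It is an illustration rather than part of the original booklet: the function name `prob_at_least`, the 10-item quiz and the 7-item passing mark are all hypothetical, and every guess is assumed to be independent.

```python
from math import comb


def prob_at_least(n_items, n_correct, p_guess=0.5):
    """Probability of n_correct or more right answers from blind guessing."""
    # Sum the binomial probabilities for every score from n_correct upward.
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )


# Hypothetical 10-item quiz with a 7-item passing mark.
for label, p in [("true-false", 1 / 2), ("four-alternative", 1 / 4)]:
    print(label, round(prob_at_least(10, 7, p), 4))
```

Under these assumptions a student guessing blindly passes the true-false quiz about 17% of the time, against well under 1% for four-alternative multiple-choice items. This is one reason a true-false test generally needs more items to yield dependable scores.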



SUGGESTIONS FOR WRITING TRUE-FALSE TEST ITEMS 



1. Base true-false items upon statements that are absolutely true or
false, without qualifications or exceptions.

Undesirable: Nearsightedness is hereditary in origin.

Desirable: Geneticists and eye specialists believe that the
predisposition to nearsightedness is hereditary.

2. Express the item statement as simply and as clearly as possible.

Undesirable: When you see a highway with a marker that reads
"Interstate 80" you know that the construction and
upkeep of that road is built and maintained by the
state and federal government.

Desirable: The construction and maintenance of interstate highways
is provided by both state and federal governments.

3. Express a single idea in each test item.

Undesirable: Water will boil at a higher temperature if the atmospheric
pressure on its surface is increased and more heat is
applied to the container.

Desirable: Water will boil at a higher temperature if the atmospheric
pressure on its surface is increased.

and/or

Water will boil at a higher temperature if more heat is
applied to the container.

4. Include enough background information and qualifications so that the
ability to respond correctly to the item does not depend on some
special, uncommon knowledge.

Undesirable: The second principle of education is that the individual
gathers knowledge.

Desirable: According to John Dewey, the second principle of
education is that the individual gathers knowledge.

5. Avoid lifting statements from the text, lecture or other materials
so that memory alone will not permit a correct answer.

Undesirable: For every action there is an opposite and equal reaction.

Desirable: If you were to stand in a canoe and throw a life jacket
forward to another canoe, chances are your canoe would
jerk backward.
6. Avoid using negatively stated item statements.

Undesirable: The Supreme Court is not composed of nine justices.
Desirable: The Supreme Court is composed of nine justices.

7. Avoid the use of unfamiliar vocabulary.

Undesirable: According to some politicians, the raison d'être for
capital punishment is retribution.

Desirable: According to some politicians, justification for the
existence of capital punishment is retribution.

8. Avoid the use of specific determiners which would permit a test-wise
but unprepared examinee to respond correctly. Specific determiners
refer to sweeping terms like "all," "always," "none," "never,"
"impossible," "inevitable," etc. Statements including such terms
are likely to be false. On the other hand, statements using
qualifying determiners such as "usually," "sometimes," "often," etc.,
are likely to be true. When statements do require the use of specific
determiners, make sure they appear in both true and false items.

Undesirable: (All) sessions of Congress are called by the President. (F)

The Supreme Court is (frequently) required to rule on the
constitutionality of a law. (T)

An objective test is (generally) easier to score than an
essay test. (T)

Desirable: (When specific determiners are used, reverse the
expected outcomes.)

The sum of the angles of a triangle is (always) 180°. (T)

Each molecule of a given compound is chemically the same
as (every) other molecule of that compound. (T)

The galvanometer is the instrument (usually) used for the
metering of electrical energy used in a home. (F)



9. False items tend to discriminate more highly than true items. Therefore, 
use more false items than true items (but no more than 15% additional 
false items). 
MATCHING TEST ITEMS

In general, matching items consist of a column of stimuli presented on the
left side of the exam page and a column of responses placed on the right
side of the page. Students are required to match the response associated
with a given stimulus. For example,

Sample Matching Test Item

Directions: On the line to the left of each factual statement, write the
letter of the principle which best explains the statement's
occurrence. Each principle may be used more than once.

Factual Statements

_____ Fossils of primates first appear in the Cenozoic rock strata,
      while trilobite remains are found in the Proterozoic rocks.

_____ The Arctic and Antarctic regions are sparsely populated.

_____ Plants have no nervous system.

_____ Large coal beds exist in Alaska.

Principles

a. There have been profound changes in the climate on earth.

b. Coordination and integration of action is generally slower
   in plants than in animals.

c. There is an increasing complexity of structure and
   functions from lower to higher forms of life.

d. All life comes from life and produces its own kind
   of living organisms.

e. Light is a limiting factor to life.



Advantages in Using Matching Items
Matching items . . . 

... require short periods of reading and response time, allowing you to 
cover more content. 

... provide objective measurement of student achievement or ability. 

... provide highly reliable test scores. 

... provide scoring efficiency and accuracy. 

Limitations in Using Matching Items 

Matching items . . . 

... have difficulty measuring learning objectives requiring more than 
simple recall of information. 

... are difficult to construct due to the problem of selecting a 
common set of stimuli and responses. 
SUGGESTIONS FOR WRITING MATCHING TEST ITEMS

1. Include directions which clearly state the basis for matching the
stimuli with the responses. Explain whether or not a response can
be used more than once and indicate where to write the answer.

Undesirable: Directions: Match the following.

Desirable: Directions: On the line to the left of each identifying
location and characteristics in Column I,
write the letter of the country in Column II
that is best defined. Each country in
Column II may be used more than once.



2. Use only homogeneous material in matching items.

Undesirable: Directions: Match the following.

1. Water                          A. NaCl
2. Discovered Radium              B. Fermi
3. Salt                           C. NH3
4. Year of the 1st Nuclear        D. H2O
   Fission by Man                 E. 1942
5. Ammonia                        F. Curie

Desirable: Directions: On the line to the left of each compound in
Column I, write the letter of the compound's
formula presented in Column II. Use each
formula only once.

Column I

_____ 1. Water

_____ 2. Salt

_____ 3. Ammonia

_____ 4. Sulfuric Acid

Column II



A. H^O^ 

B. HCl 

C. NaCl 
D. 
E. 



H^HCl 
3. Arrange the list of responses in some systematic order if possible
(e.g., chronological, alphabetical).

Directions: On the line to the left of each definition in Column I,
write the letter of the defense mechanism in Column II
that is described. Use each defense mechanism only once.

Column I

_____ 1. Hunting for reasons to support one's beliefs.

_____ 2. Accepting the values and norms of others as one's
         own even when they are contrary to previously
         held values.

_____ 3. Attributing to others one's own unacceptable
         impulses, thoughts and desires.

_____ 4. Ignoring disagreeable situations, topics,
         sights.

Column II

Undesirable                  Desirable

a. Rationalization           a. Denial of reality
b. Identification            b. Identification
c. Projection                c. Introjection
d. Introjection              d. Projection
e. Denial of reality         e. Rationalization



4. Avoid grammatical or other clues to the correct response.

Undesirable: Directions: Match the following in order to complete
the sentences on the left.

1. Igneous rocks are formed           A. a hardness of 7.
2. The formation of coal requires     B. with crystalline rock.
3. A geode is filled                  C. a metamorphic rock.
4. Feldspar is classified as          D. heat and pressure.
                                      E. through the solidification
                                         of molten lava.

Desirable: Avoid sentence completion due to grammatical clues.

5. Keep matching items brief, limiting the list of stimuli to under 10.

6. Include more responses than stimuli to help prevent answering through
the process of elimination.
7. When possible, reduce the amount of reading time by including only short
phrases or single words in the response list.

COMPLETION TEST ITEMS

The completion item requires the student to answer a question or to finish
an incomplete statement by filling in a blank with the correct word or
phrase. For example,

Sample Completion Item

According to Freud, personality is made of three major systems, the
_____, the _____ and the _____.



Advantages in Using Completion Items 
Completion items ... 

... can provide a wide sampling of content. 

... can efficiently measure lower levels of cognitive ability.

... can minimize guessing as compared to multiple-choice or true-false
items.

... can usually provide an objective measure of student achievement
or ability.



Limitations in Using Completion Items

Completion items ...

... are difficult to construct so that the desired response is
clearly indicated.

... have difficulty measuring learning objectives requiring more than
simple recall of information.

... can often include more irrelevant clues than do other item types.

... are more time consuming to score when compared to multiple-choice
or true-false items.

... are more difficult to score since more than one answer may have to
be considered correct if the item was not properly prepared.
SUGGESTIONS FOR WRITING COMPLETION TEST ITEMS

1. Omit only significant words from the statement.

Undesirable: Every atom has a central (core) called a nucleus. 
Desirable: Every atom has a central core called a(n) (nucleus) . 

2. Do not omit so many words from the statement that the intended meaning 
is lost. 

Undesirable: The _____ were to Egypt as the _____ were to
Persia and as _____ were to the early tribes of Israel.

Desirable: The Pharaohs were to Egypt as the _____ were to
Persia and as _____ were to the early tribes of Israel.

3. Avoid grammatical or other clues to the correct response. 

Undesirable: Most of the United States' libraries are organized
according to the (Dewey) decimal system.

Desirable: Which organizational system is used by most of the
United States' libraries? (Dewey decimal)

4. Be sure there is only one correct response.

Undesirable: Trees which shed their leaves annually are (seed-bearing,
common).

Desirable: Trees which shed their leaves annually are called (deciduous).

5. Make the blanks of equal length.

Undesirable: In Greek mythology, Vulcan was the son of ______(Jupiter)______
and __(Juno)__.

Desirable: In Greek mythology, Vulcan was the son of ____(Jupiter)____ and
____(Juno)____.

6. When possible, delete words at the end of the statement after the 
student has been presented a clearly defined problem. 

Undesirable: (122.5) is the molecular weight of KClO3.

Desirable: The molecular weight of KClO3 is (122.5).

7. Avoid lifting statements directly from the text, lecture or other
sources.



8. Limit the required response to a single word or phrase. 
ESSAY TEST ITEMS

The essay test is probably the most popular of all types of teacher-made
tests. In general, a classroom essay test consists of a small number of
questions to which the student is expected to demonstrate his/her ability
to (a) recall factual knowledge, (b) organize this knowledge and (c)
present the knowledge in a logical, integrated answer to the question.
An essay test item can be classified as either an extended-response essay
item or a short-answer essay item. The latter calls for a more restricted
or limited answer in terms of form or scope. An example of each type of
essay item follows.

Sample Extended-Response Essay Item

Explain the difference between the S-R (Stimulus-Response) and the S-O-R
(Stimulus-Organism-Response) theories of personality. Include in your
answer (a) brief descriptions of both theories, (b) supporters of both
theories and (c) research methods used to study each of the two theories.
(10 pts. 20 minutes)

Sample Short-Answer Essay Item

Identify research methods used to study the S-R (Stimulus-Response) and
S-O-R (Stimulus-Organism-Response) theories of personality.
(5 pts. 20 minutes)

Advantages in Using Essay Items
Essay items . . . 

... are easier and less time consuming to construct than are most other 
item types. 

... provide a means for testing student's ability to compose an 
answer and present it in a logical manner. 

... can efficiently measure higher order cognitive objectives (e.g., 
analysis, synthesis, evaluation) . 

Limitations in Using Essay Items 
Essay items . . . 

... cannot measure a large amount of content or objectives. 

... generally provide low test and test scorer reliability. 

... require an extensive amount of instructor's time to read and grade. 

... generally do not provide an objective measure of student achievement 
or ability (subject to bias on the part of the grader). 

SUGGESTIONS FOR WRITING ESSAY TEST ITEMS



1. Prepare essay items that elicit the type of behavior you want to measure. 

Learning Objective: The student will be able to explain how the normal
curve serves as a statistical model.

Undesirable: Describe a normal curve in terms of symmetry, modality,
kurtosis and skewness.

Desirable: Briefly explain how the normal curve serves as a
statistical model for estimation and hypothesis testing.



2. Phrase each item so that the student's task is clearly indicated.

Undesirable: Discuss the economic factors which led to the stock
market crash of 1929.

Desirable: Identify the three major economic conditions which led
to the stock market crash of 1929. Discuss briefly each
condition in correct chronological sequence and in one
paragraph indicate how the three factors were
interrelated.



3. Indicate for each item a point value or weight and an estimated time 
limit for answering. 

Undesirable: Compare the writings of Bret Harte and Mark Twain
in terms of settings, depth of characterization, and
dialogue styles of their main characters.

Desirable: Compare the writings of Bret Harte and Mark Twain
in terms of settings, depth of characterization, and
dialogue styles of their main characters.
(20 points 20 minutes)



4. Ask questions that will elicit responses on which experts could agree
that one answer is better than another.



5. Avoid giving the student a choice among optional items as this 
greatly reduces the reliability of the test. 



6. For classroom examinations, it is generally recommended that you
administer several short-answer items rather than only one or two
extended-response items.

SUGGESTIONS FOR SCORING ESSAY ITEMS



1. Choose a scoring model. Two of the more common scoring models are
ANALYTICAL SCORING and GLOBAL QUALITY.

ANALYTICAL SCORING: Each answer is compared to an ideal answer and points
are assigned for the inclusion of necessary elements. Grades are based on
the number of accumulated points either absolutely (e.g., A=10 or more
points, B=6-9 pts., etc.) or relatively (e.g., A=top 15% of scores,
B=next 30% of scores, etc.).

GLOBAL QUALITY: Each answer is read and assigned a score (e.g., grade,
total points) based either on the total quality of the response or on the
total quality of the response relative to other student answers.
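
The difference between absolute and relative grading can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the "C" grade for everyone below the listed bands is an assumption (the text stops at "etc."), and tied scores are not treated specially.

```python
def absolute_grades(points, cutoffs, lowest="C"):
    """Grade each student against fixed point cutoffs (e.g., A = 10 or more)."""
    grades = {}
    for student, pts in points.items():
        # cutoffs are listed from the highest grade down; take the first one met
        grades[student] = next((g for g, minimum in cutoffs if pts >= minimum), lowest)
    return grades


def relative_grades(points, shares, lowest="C"):
    """Grade each student by rank (e.g., A = top 15% of scores, B = next 30%)."""
    ranked = sorted(points, key=points.get, reverse=True)
    grades = {}
    start = 0
    cumulative = 0.0
    for grade, share in shares:
        cumulative += share
        end = round(cumulative * len(ranked))
        for student in ranked[start:end]:
            grades[student] = grade
        start = end
    for student in ranked[start:]:
        grades[student] = lowest  # assumed grade for the remaining students
    return grades


scores = {"paper 1": 12, "paper 2": 7, "paper 3": 3}
print(absolute_grades(scores, [("A", 10), ("B", 6)]))
```

Under the absolute model every paper could earn an A; under the relative model the number of A's is fixed by the size of the class.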

Example Essay Item and Grading Models

"Americans are a mixed-up people with no sense of ethical values.
Everyone knows that baseball is far less necessary than food and
steel, yet they pay ball players a lot more than farmers and
steelworkers."

WHY? Use 3-4 sentences to indicate how an economist would explain
the above situation.

Analytical Scoring

Necessary Elements to be Included in Response                 Points

Salaries are based on demand relative to supply of
such services.                                                   3
Excellent ball players are rare.                                 2
Ball clubs have a high demand for excellent players.             2
Clarity of Response                                              2
                                                              ------
                                                              9 pts.
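
As a rough sketch, the analytical scoring key above can be written down as a small lookup table so that every paper is scored against the same elements. The names below are hypothetical, and the 3-point value for the first element is an assumption inferred from the 9-point total.

```python
# Maximum points per necessary element, following the scoring key above.
RUBRIC = {
    "salaries based on demand relative to supply": 3,  # assumed from the 9-pt total
    "excellent ball players are rare": 2,
    "ball clubs have a high demand for excellent players": 2,
    "clarity of response": 2,
}


def analytical_score(awarded, rubric=RUBRIC):
    """Add up the points awarded for each element, checking them against the key."""
    total = 0
    for element, pts in awarded.items():
        if element not in rubric:
            raise KeyError("not in the scoring key: " + element)
        if not 0 <= pts <= rubric[element]:
            raise ValueError("points out of range for: " + element)
        total += pts
    return total


# A response that covers supply and demand fully and is clearly written,
# but mentions only one of the two supporting points.
print(analytical_score({
    "salaries based on demand relative to supply": 3,
    "excellent ball players are rare": 2,
    "clarity of response": 2,
}))
```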

Global Quality

Assign scores or grades on the overall quality of the written response as
compared to an ideal answer. Or, compare the overall quality of a response
to other student responses by sorting the papers into three stacks:

    Below Average          Average                Above Average

Read and sort each stack again, dividing each into three more stacks:

    Below Average          Average                Above Average
    -------------------    -------------------    -------------------
    Below  Avg.  Above     Below  Avg.  Above     Below  Avg.  Above
    Avg.         Avg.      Avg.         Avg.      Avg.         Avg.

In total, nine discriminations can be used to assign test grades in this manner.
The number of stacks or discriminations can vary to meet your needs.
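
One way to see where the nine discriminations come from is to combine the two sorting passes into a single rank. The sketch below is illustrative only; the function name and the 1-to-9 numbering are assumptions, not part of the procedure described above.

```python
LEVELS = ("below", "average", "above")


def nine_point_rank(first_sort, second_sort):
    """Combine two three-way sorts into one rank from 1 (lowest) to 9 (highest)."""
    return 3 * LEVELS.index(first_sort) + LEVELS.index(second_sort) + 1


# A paper placed in the "above average" stack on the first reading and in the
# "below average" part of that stack on the second reading.
print(nine_point_rank("above", "below"))
```

The first reading carries more weight than the second, so a paper never drops below any paper from a lower first-pass stack.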

2. Try not to allow factors which are irrelevant to the learning outcomes
being measured to affect your grading (e.g., handwriting, spelling,
neatness).



3. Read and grade all class answers to one item before going on to 
the next item. 



4. Read and grade the answers without looking at the students' names
to avoid possible preferential treatment.



5. Occasionally shuffle papers during the reading of answers to help
avoid any systematic order effects (e.g., Sally's "B" work always
followed Jim's "A" work, thus it looked more like "C" work).



6. When possible, ask another instructor to read and grade your students' 
responses. 
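
Suggestions 3 through 5 can be combined into a simple reading plan. The sketch below assumes that each paper is labeled with a code number in place of the student's name; the function and variable names are hypothetical.

```python
import random


def blind_grading_plan(student_names, item_numbers, seed=None):
    """Return (key, plan): a code-to-name key and a reading order for each item."""
    rng = random.Random(seed)
    codes = list(range(1, len(student_names) + 1))
    rng.shuffle(codes)
    # The key links code numbers to names; set it aside until grading is finished.
    key = dict(zip(codes, student_names))
    plan = {}
    for item in item_numbers:
        order = sorted(key)
        rng.shuffle(order)  # a fresh order for each item limits order effects
        plan[item] = order  # read every paper's answer to this item before moving on
    return key, plan


key, plan = blind_grading_plan(["Sally", "Jim", "Ana"], [1, 2])
for item, order in plan.items():
    print("Item", item, "read papers in this order:", order)
```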

PROBLEM SOLVING TEST ITEMS

Another form of a subjective test item is the problem solving or
computational exam question. Such items present the student with a problem
situation or task and require a demonstration of work procedures and a
correct solution, or just a correct solution. This kind of test item is
classified as a subjective type of item due to the procedures used to
score item responses. Instructors can assign full or partial credit to
either correct or incorrect solutions depending on the quality and kind
of work procedures presented. An example of a problem solving test item
follows.

Example Problem Solving Test Item 

It was calculated that 75 men could complete a strip on a new highway in 70
days. When work was scheduled to commence, it was found necessary to send
25 men on another road project. How many days longer will it take to
complete the strip? Show your work for full or partial credit.
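
A scoring key for this item might record the work procedure as well as the final answer. The sketch below assumes that every man works at the same rate, so the job is a fixed number of man-days; the function name is hypothetical.

```python
from fractions import Fraction


def extra_days(planned_workers, planned_days, workers_removed):
    """Days added to a job when part of the crew is reassigned."""
    total_work = planned_workers * planned_days            # 75 x 70 = 5250 man-days
    remaining_workers = planned_workers - workers_removed  # 75 - 25 = 50 men
    actual_days = Fraction(total_work, remaining_workers)  # 5250 / 50 = 105 days
    return actual_days - planned_days                      # 105 - 70 = 35 days


print(extra_days(75, 70, 25))
```

A student who finds the 105 days but forgets to subtract the original 70 has shown most of the procedure, which is the kind of response partial credit is meant for.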

Advantages in Using Problem Solving Items

Problem solving items ... 

... minimize guessing by requiring the students to provide an original 
response rather than to select from several alternatives. 

... are easier to construct than are multiple-choice or matching items. 

... can most appropriately measure learning objectives which focus on 

the ability to apply skills or knowledge in the solution of problems. 

... can measure an extensive amount of content or objectives. 

Limitations in Using Problem Solving Items 

Problem solving items ... 

... generally provide low test and test scorer reliability. 

... require an extensive amount of instructor time to read and grade. 

... generally do not provide an objective measure of student achievement
or ability (subject to bias on the part of the grader when partial
credit is given).

SUGGESTIONS FOR WRITING PROBLEM SOLVING TEST ITEMS

1. Clearly identify and explain the problem.

Undesirable: During a car crash, the car slows down at the rate of
490 m/sec². What is the magnitude and direction of the
force acting on a 100-kg driver?

Desirable: During a car crash, the car slows down at the rate of
490 m/sec². Using the car as a frame of reference, what
is the magnitude and direction of the gram force acting
on a 100-kg driver?



2. Provide directions which clearly inform the student of the type of
response called for.

Undesirable: An American tourist in Paris finds that he weighs 70
kilograms. When he left the United States he weighed
144 pounds. What was his net change in weight?

Desirable: An American tourist in Paris finds that he weighs 70
kilograms. When he left the United States he weighed
144 pounds. What was his net weight change in pounds?



3. State in the directions whether or not the student must show his/her
work procedures for full or partial credit.

Undesirable: A double concave lens is made of glass with n = 1.50.
If the radii of curvature of the two lens surfaces are
both 30.0 cm, what is the focal length of the lens?

Desirable: A double concave lens is made of glass with n = 1.50.
If the radii of curvature of the two lens surfaces are
both 30.0 cm, what is the focal length of the lens?
Show your work to receive full or partial credit.

4. Clearly separate item parts and indicate their point values.

A man leaves his home and drives to a convention at an average rate of 
50 miles per hour. Upon arrival, he finds a telegram advising him to 
return at once. He catches a plane that takes him back at an average 
rate of 300 miles per hour. 

Undesirable: If the total traveling time was 1 3/4 hours, how long 

did it take him to fly back? How far from his home was 
the convention? 

Desirable: If the total traveling time was 1 3/4 hours: 

(1) How long did it take him to fly back? (1 pt.) 

(2) How far from his home was the convention? (1 pt.) 
Show your work for full or partial credit. 

5. Use figures, conditions and situations which create a realistic problem.

Undesirable: An automobile weighing 2,840 N (about 640 pounds)
is traveling at a speed of 300 miles per hour.
What is the car's kinetic energy? Show your work. (2 pts.)

Desirable: An automobile weighing 14,200 N (about 3200 pounds)
is traveling at a speed of 12 m/sec. What is the car's
kinetic energy? Show your work. (2 pts.)

6. Ask questions that elicit responses on which experts could agree that
one solution and one or more work procedures are better than others.

7. Work through each problem before classroom administration to
double-check accuracy.
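
Working through a problem in advance can be as simple as a short calculation kept with the answer key. The sketch below checks two of the sample items above. The function names are hypothetical, and the value g = 9.8 m/sec² used to convert weight to mass is an assumption, since the item itself does not state it.

```python
from fractions import Fraction


def convention_trip(total_hours, drive_mph, fly_mph):
    """Solve d/drive + d/fly = total for the one-way distance d and the flying time."""
    distance = total_hours / (Fraction(1, drive_mph) + Fraction(1, fly_mph))
    fly_hours = distance / fly_mph
    return distance, fly_hours


def kinetic_energy_joules(weight_newtons, speed_m_per_sec, g=9.8):
    """Kinetic energy (1/2)mv^2, with mass found from weight as m = W / g."""
    mass_kg = weight_newtons / g
    return 0.5 * mass_kg * speed_m_per_sec ** 2


distance, fly_hours = convention_trip(Fraction(7, 4), 50, 300)
print(distance, "miles;", fly_hours * 60, "minutes in the air")
print(round(kinetic_energy_joules(14200, 12)), "joules")
```

With these figures the convention is 75 miles from home, the flight back takes 15 minutes, and the car's kinetic energy is a little over 104,000 joules.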

PERFORMANCE TEST ITEMS

A performance test item is designed to assess the ability of a student to
perform correctly in a simulated situation (i.e., a situation in which the
student will be ultimately expected to apply his/her learning). The
concept of simulation is central in performance testing; a performance 
test will simulate to some degree a real life situation to accomplish 
the assessment. In theory, a performance test could be constructed for 
any skill and real life situation. In practice, most performance tests 
have been developed for the assessment of vocational, managerial, 
administrative, leadership, communication, interpersonal and physical 
education skills in various simulated situations. An illustrative example 
of a performance test item is provided below. 



Sample Performance Test Item 

Assume that some of the instructional objectives of an urban planning course 
include the development of the student's ability to effectively use the 
principles covered in the course in various "real life" situations common 
for an urban planning professional. A performance test item could measure 
this development by presenting the student with a specific situation which 
represents a "real life" situation. For example:

An urban planning board makes a last minute request for the professional
to act as consultant and critique a written proposal which is to be
considered in a board meeting that very evening. The professional arrives
before the meeting and has one hour to analyze the written proposal and
prepare his critique. The critique presentation is then made verbally
during the board meeting; reactions of members of the board or the audience
include requests for explanation of specific points or informed attacks on
the positions taken by the professional.

The performance test designed to simulate this situation would require that
the student being tested role play the professional's part, while students
or faculty act the other roles in the situation. Various aspects of the
"professional's" performance would then be observed and rated by several
judges with the necessary background. The ratings could then be used both
to provide the student with a diagnosis of his/her strengths and weaknesses
and to contribute to an overall summary evaluation of the student's
abilities.



Advantages in Using Performance Test Items 
Performance test items ... 

... can most appropriately measure learning objectives which focus 
on the ability of the students to apply skills or knowledge in 
real life situations. 

... usually provide a degree of test validity not possible with 
standard paper and pencil test items. 

... are useful for measuring learning objectives in the psychomotor 
domain. 



Limitations in Using Performance Test Items
Performance test items ...

... are difficult and time consuming to construct. 

... are primarily used for testing students individually and not for 
testing groups. Consequently, they are relatively costly, time 
consuming, and inconvenient forms of testing. 

... generally provide low test and test scorer reliability.

... generally do not provide an objective measure of student achievement 
or ability (subject to bias on the part of the observer/grader). 



SUGGESTIONS FOR WRITING PERFORMANCE TEST ITEMS 

1. Prepare items that elicit the type of behavior you want to measure. 

2. Clearly identify and explain the simulated situation to the student. 

3. Make the simulated situation as "life-like" as possible.

4. Provide directions which clearly inform the students of the type of 
response called for. 

5. When appropriate, clearly state time and activity limitations in the 
directions. 

6. Adequately train the observer(s)/scorer(s) to ensure that they are 
fair in scoring the appropriate behaviors. 

III. TWO METHODS FOR ASSESSING TEST ITEM QUALITY 



This section of the booklet presents two methods for collecting feedback on
the quality of your test items: self-review checklists and student
evaluation of test item quality. You can use the information gathered from
either method to identify strengths and weaknesses in your item writing.

CHECKLIST FOR EVALUATING TEST ITEMS 

EVALUATE YOUR TEST ITEMS BY CHECKING THE SUGGESTIONS WHICH YOU FEEL YOU HAVE
FOLLOWED.

Multiple-Choice Test Items 

When possible, stated the stem as a direct question rather than as an
incomplete statement.

Presented a definite, explicit and singular question or problem in the
stem.

Eliminated excessive verbiage or irrelevant information from the stem.

Included in the stem any word(s) that might have otherwise been repeated
in each alternative.

Used negatively stated stems sparingly. When used, underlined and/or
capitalized the negative word(s).

Made all alternatives plausible and attractive to the less knowledgeable
or skillful student.

Made the alternatives grammatically parallel with each other, and
consistent with the stem.

Made the alternatives mutually exclusive.

When possible, presented alternatives in some logical order (e.g.,
chronologically, most to least).

Made sure there was only one correct or best response per item.

Made alternatives approximately equal in length.

Avoided irrelevant clues such as grammatical structure, well known
verbal associations or connections between stem and answer.

Used at least four alternatives for each item.

Randomly distributed the correct response among the alternative positions
throughout the test, having approximately the same proportion of
alternatives a, b, c, d, and e as the correct response.

Used the alternatives "none of the above" and "all of the above" sparingly.
When used, such alternatives were occasionally the correct response.
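
The checklist point about distributing the correct response randomly but evenly across positions can be handled mechanically. The sketch below is one possible approach, not a prescribed method; the function name is hypothetical, and it assumes the alternatives of each item can be arranged freely, which conflicts with the advice to present alternatives in a logical order whenever such an order exists.

```python
import random
from collections import Counter


def balanced_key(num_items, positions="abcde", seed=None):
    """Return a shuffled answer key in which each position is used about equally often."""
    rng = random.Random(seed)
    repeats = -(-num_items // len(positions))         # round up
    pool = (list(positions) * repeats)[:num_items]    # near-equal counts per position
    rng.shuffle(pool)                                 # randomize which item gets which
    return pool


key = balanced_key(50)
print(Counter(key))  # each of a, b, c, d, e is the correct position 10 times
```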

True-False Test Items

Based true-false items upon statements that are absolutely true or
false, without qualifications or exceptions.

Expressed the item statement as simply and as clearly as possible.

Expressed a single idea in each test item.

Included enough background information and qualifications so that the
ability to respond correctly did not depend on some special, uncommon
knowledge.

Avoided lifting statements from the text, lecture or other materials.

Avoided using negatively stated item statements.

Avoided the use of unfamiliar language.

Avoided the use of specific determiners such as "all," "always,"
"none," "never," etc., and qualifying determiners such as "usually,"
"sometimes," "often," etc.

Used more false items than true items (but not more than 15%
additional false items).



Matching Test Items 

Included directions which clearly stated the basis for matching the 
stimuli with the response. 

Explained whether or not a response could be used more than once and
indicated where to write the answer.

Used only homogeneous material.

When possible, arranged the list of responses in some systematic
order (e.g., chronologically, alphabetically).

Avoided grammatical or other clues to the correct response.

Kept items brief (limited the list of stimuli to under 10).

Included more responses than stimuli.

When possible, reduced the amount of reading time by including only
short phrases or single words in the response list.

Completion Test Items

Omitted only significant words from the statement.

Did not omit so many words from the statement that the intended
meaning was lost.

Avoided grammatical or other clues to the correct response.

Included only one correct response per item.

Made the blanks of equal length.

When possible, deleted the words at the end of the statement after
the student was presented with a clearly defined problem.

Avoided lifting statements directly from the text, lecture or other
sources.

Limited the required response to a single word or phrase.

Essay Test Items 

Prepared items that elicited the type of behavior you wanted to measure. 

Phrased each item so that the student's task was clearly indicated. 

Indicated for each item a point value or weight and an estimated time 
limit for answering. 

Asked questions that elicited responses on which experts could agree 
that one answer is better than others. 

Avoided giving the student a choice among optional items. 

Administered several short-answer items rather than 1 or 2
extended-response items.

Grading Essay Test Items

Selected an appropriate grading model.

Tried not to allow factors which were irrelevant to the learning
outcomes being measured to affect your grading (e.g., handwriting,
spelling, neatness).

Read and graded all class answers to one item before going on to the 
next item. 

Read and graded the answers without looking at the student's name to
avoid possible preferential treatment.

Occasionally shuffled papers during the reading of answers.

When possible, asked another instructor to read and grade your
students' responses.

Problem Solving Test Items 

Clearly identified and explained the problem to the student. 

Provided directions which clearly informed the student of the type 
of response called for. 

Stated in the directions whether or not the student must show work
procedures for full or partial credit.

Clearly separated item parts and indicated their point values. 

Used figures, conditions and situations which created a realistic 
problem. 

Asked questions that elicited responses on which experts could 
agree that one solution and one or more work procedures are better 
than others. 

Worked through each problem before classroom administration. 



Performance Test Items 

Prepared items that elicited the type of behavior you wanted to measure.

Clearly identified and explained the simulated situation to the student.

Made the simulated situation as "life-like" as possible.

Provided directions which clearly informed the students of the type of
response called for.

When appropriate, clearly stated time and activity limitations in the
directions.

Adequately trained the observer(s)/scorer(s) to ensure that they were
fair in scoring the appropriate behaviors.

STUDENT EVALUATION OF TEST ITEM QUALITY

USING ICES QUESTIONNAIRE ITEMS 
TO ASSESS YOUR TEST ITEM QUALITY 



The following set of ICES (Instructor and Course Evaluation System) questionnaire 
items can be used to assess the quality of your test items. The items are 
presented with their original ICES catalogue number. You are encouraged to 
include one or more of the items on the ICES evaluation form in order to collect 
student opinion of your item writing quality. 



102 — How would you rate the instructor's 
examination questions? 
Excellent Poor 



116 — Did the exams challenge you to do 
original thinking? 

Yes, very          No, not
challenging        challenging



103 — How well did examination questions 
reflect content and emphasis of 
the course? 

Well Poorly 
related related 



118 — Were there "trick" or trite 
questions on tests? 

Lots of Few if 

them any 



114 — The exams reflected important 

points in the reading assignments. 
Strongly           Strongly
agree              disagree



117 — Examinations mainly tested 
trivia. 

Strongly Strongly 

agree disagree 



119 — Were exam questions worded clearly? 
Yes, very No, very 

clear unclear 



122 — How difficult were the examinations? 
Too Too 
difficult easy 



123 — I found I could score reasonably 

well on exams by just cramming. 
Strongly           Strongly
agree              disagree



121 — How was the length of exams for 
the time allotted?

Too Too 
long short 



115 — Were the instructor's test 
questions thought provoking? 
Definitely Definitely 
yes no 



109 — Were exams, papers, reports returned 
with errors explained or personal 
comments? 

Almost Almost 
always never 



125 — Were exams adequately discussed 
upon return? 

Yes,               No,
adequately         not enough

IV. ASSISTANCE OFFERED BY THE OFFICE OF
INSTRUCTIONAL AND MANAGEMENT SERVICES (IMS)

The information in the booklet is intended for self-instruction.
However, IMS staff members will consult with faculty who wish to analyze
and improve their test item writing. The staff can also consult with
faculty about other instructional problems. The Measurement and Evaluation
Division of IMS also publishes a quarterly newsletter called THE ANSWER
SHEET which discusses various classroom testing and measurement issues.
Instructors wishing to receive the newsletter or to acquire IMS assistance
can call the Measurement and Evaluation Division at 333-3490.